Data-driven Calibration of Penalties for Least-Squares Regression

Authors

  • Sylvain Arlot
  • Pascal Massart
Abstract

Penalization procedures often suffer from their dependence on multiplying factors, whose optimal values are either unknown or hard to estimate from the data. We propose a completely data-driven calibration algorithm for this parameter in the least-squares regression framework, without assuming a particular shape for the penalty. Our algorithm relies on the concept of minimal penalty, recently introduced by Birgé and Massart (2007) in the context of penalized least squares for Gaussian homoscedastic regression. On the positive side, the minimal penalty can be evaluated from the data themselves, leading to a data-driven estimation of an optimal penalty which can be used in practice; on the negative side, their approach heavily relies on the homoscedastic Gaussian nature of their stochastic framework. The purpose of this paper is twofold: stating a more general heuristic for designing a data-driven penalty (the slope heuristics) and proving that it works for penalized least-squares random-design regression, even for heteroscedastic non-Gaussian data. For technical reasons, some exact mathematical results are proved only for regressogram bin-width selection. This is at least a first step towards further results, since the approach and the method that we use are indeed general.
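
To make the idea of calibrating a penalty from a minimal penalty concrete, here is a minimal sketch in Python for regressogram bin-width selection. It is an illustration under stated assumptions, not the authors' algorithm: the penalty shape D/n, the "largest dimension jump" rule for locating the minimal constant, the factor 2 between the minimal and the final penalty (the usual statement of the slope heuristics, which the abstract does not spell out), and all function names, the grid of constants and the simulated data are choices made for this example.

```python
import numpy as np

def regressogram_risk(x, y, n_bins):
    """Empirical least-squares risk of a regressogram with n_bins equal-width bins on [0, 1]."""
    bins = np.minimum((x * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins, weights=y, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    # Bin means; empty bins get 0 (they contain no observation, so the risk is unaffected).
    means = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
    return np.mean((y - means[bins]) ** 2)

def slope_heuristic_select(x, y, dims, k_grid):
    """Select a number of bins with a dimension-jump sketch of the slope heuristics.

    Assumptions of this sketch: penalty shape D / n, and a final penalty
    equal to twice the estimated minimal penalty.
    """
    n = len(y)
    dims = np.asarray(dims)
    k_grid = np.asarray(k_grid, dtype=float)
    risks = np.array([regressogram_risk(x, y, d) for d in dims])
    shape = dims / n  # assumed penalty shape
    # Dimension selected for each candidate constant K (k_grid must be increasing).
    selected = np.array([dims[np.argmin(risks + k * shape)] for k in k_grid])
    # Estimated minimal constant: the K at which the selected dimension drops the most.
    jumps = selected[:-1] - selected[1:]
    k_min = k_grid[np.argmax(jumps) + 1]
    # Final choice: penalty equal to twice the estimated minimal penalty.
    best_dim = dims[np.argmin(risks + 2.0 * k_min * shape)]
    return int(best_dim), float(k_min)

# Usage on simulated heteroscedastic data (all values here are illustrative).
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 1.0, size=n)
noise_sd = 0.3 + 0.4 * x  # noise level depends on x
y = np.sin(2 * np.pi * x) + noise_sd * rng.normal(size=n)

dims = np.arange(1, 101)
k_grid = np.linspace(0.0, 2.0, 201)
best_dim, k_min = slope_heuristic_select(x, y, dims, k_grid)
print(best_dim, k_min)
```

The sketch scans an increasing grid of constants, so the estimated minimal constant is only as precise as the grid spacing. In the heteroscedastic setting the abstract targets, the shape D/n is itself a simplifying assumption; the procedure only requires that some penalty shape be supplied.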

Similar resources

Spectrophotometric Simultaneous Kinetic Determination of Iodide and Iodate Using Partial Least-Squares Calibration Method in a Single Kinetic Run

A rapid, sensitive and versatile kinetic method is presented for the simultaneous spectrophotometric determination of iodide and iodate by partial least-squares regression (PLS) using original and derivative data, namely absorbance and rate data. The method is based on the catalytic effect of the cited anions on the reaction rate between Ce(IV) and As(III) in 2 mol l⁻¹ sulfuric acid medium. The ...

Suboptimality of penalties proportional to the dimension for model selection in heteroscedastic regression

We consider the problem of choosing between several models in least-squares regression with heteroscedastic data. We prove that any penalization procedure is suboptimal when the penalty is proportional to the dimension of the model, at least for some typical heteroscedastic model selection problems. In particular, Mallows’ Cp is suboptimal in this framework, as well as any “linear” penalty depe...

Determination of Protein and Moisture in Fishmeal by Near-Infrared Reflectance Spectroscopy and Multivariate Regression Based on Partial Least Squares

The potential of Near Infrared Reflectance Spectroscopy (NIRS) as a fast method to predict the Crude Protein (CP) and Moisture (M) content in fishmeal by scanning spectra between 1000 and 2500 nm using multivariate regression technique based on Partial Least Squares (PLS) was evaluated. The coefficient of determination in calibration (R2C) and Standard Error of Calibra...

Simultaneous spectrophotometric determination of lead, copper and nickel using xylenol orange by partial least squares

A partial least squares (PLS) calibration model was developed for the simultaneous spectrophotometric determination of Pb(II), Cu(II) and Ni(II) using xylenol orange as a chromogenic reagent. The parameters controlling behavior of the system were investigated and optimum conditions were selected. The calibration graphs were linear in the ranges of 0.0–9.091, 0.0–2.719 and 0.0–2.381 ppm for lead...

Combined Sum of Squares Penalties for Molecular Divergence Time Estimation

Estimates of molecular divergence times when rates of evolution vary require the assumption of a model of rate change. Brownian motion is one such model, and since rates cannot become negative, a log Brownian model seems appropriate. Divergence time estimates can then be made using weighted least squares penalties. As sequences become long, this approach effectively becomes equivalent to penali...

Journal:
  • Journal of Machine Learning Research

Volume 10

Publication date: 2009